Collaborating Authors

Kevin Jamieson



fair_active_learning_neurips22

Romain Camilleri

Neural Information Processing Systems

Algorithm 1: Best Safe Arm Identification (BESIDE). Input: tolerance ε, confidence δ.
Figure 7: Half circle dataset. Figure 8: Precision. Figure 9: Recall.




Active Learning with Safety Constraints

Camilleri, Romain, Wagenmaker, Andrew, Morgenstern, Jamie, Jain, Lalit, Jamieson, Kevin

arXiv.org Machine Learning

Active learning methods have shown great promise in reducing the number of samples necessary for learning. As automated learning systems are adopted into real-time, real-world decision-making pipelines, it is increasingly important that such algorithms are designed with safety in mind. In this work we investigate the complexity of learning the best safe decision in interactive environments. We reduce this problem to a constrained linear bandits problem, where our goal is to find the best arm satisfying certain (unknown) safety constraints. We propose an adaptive experimental-design-based algorithm, which we show efficiently trades off between the difficulty of showing an arm is unsafe vs. showing it is suboptimal. To our knowledge, our results are the first on best-arm identification in linear bandits with safety constraints. In practice, we demonstrate that this approach performs well on synthetic and real-world datasets.
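To make the setting concrete, here is a minimal elimination-style sketch of best-arm identification under an unknown linear safety constraint. This is not the paper's BESIDE algorithm; the arm set, the parameters `theta_star` and `mu_star`, the noise level, and the convention that an arm z is safe iff ⟨mu_star, z⟩ ≤ 0 are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance (all names and numbers are illustrative, not from the paper).
# Arms are vectors in R^d. Reward <theta_star, z> and safety value
# <mu_star, z> are both unknown linear functions; we adopt the convention
# that an arm z is "safe" iff <mu_star, z> <= 0.
arms = np.array([[1.0, 0.0],    # unsafe, high reward
                 [0.0, 1.0],    # safe, best reward among safe arms
                 [-1.0, 0.0],   # safe, low reward
                 [0.5, 0.0]])   # unsafe
theta_star = np.array([1.0, 0.5])   # reward parameter (unknown to learner)
mu_star = np.array([0.6, -0.4])     # safety parameter (unknown to learner)
n_arms = len(arms)

def pull(z):
    """Noisy observations of reward and safety value for arm index z."""
    r = arms[z] @ theta_star + 0.1 * rng.normal()
    s = arms[z] @ mu_star + 0.1 * rng.normal()
    return r, s

# Naive elimination loop: keep sampling surviving arms, build confidence
# intervals, discard arms that are provably unsafe, and discard arms whose
# reward upper bound falls below the best reward lower bound among the
# provably safe arms. This captures the unsafe-vs-suboptimal tradeoff only
# crudely; the paper's algorithm plans samples via experimental design.
active = set(range(n_arms))
counts = np.zeros(n_arms)
r_mean = np.zeros(n_arms)
s_mean = np.zeros(n_arms)
for t in range(1, 2001):
    for z in list(active):
        r, s = pull(z)
        counts[z] += 1
        r_mean[z] += (r - r_mean[z]) / counts[z]
        s_mean[z] += (s - s_mean[z]) / counts[z]
    width = np.sqrt(2 * np.log(4 * n_arms * t**2) / counts.clip(min=1))
    # Provably unsafe: the safety lower bound is above zero.
    active = {z for z in active if s_mean[z] - width[z] <= 0}
    # Provably safe arms anchor the suboptimality test.
    safe = {z for z in active if s_mean[z] + width[z] <= 0}
    if safe:
        best_lcb = max(r_mean[z] - width[z] for z in safe)
        active = {z for z in active if r_mean[z] + width[z] >= best_lcb}
    if len(active) == 1:
        break

best = max(active, key=lambda z: r_mean[z])
```

In this instance the loop eliminates the high-reward arm because it is provably unsafe, and the low-reward safe arm because it is provably suboptimal, leaving arm 1 as the best safe arm.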


Experimental Design for Regret Minimization in Linear Bandits

Wagenmaker, Andrew, Katz-Samuels, Julian, Jamieson, Kevin

arXiv.org Machine Learning

In this paper we propose a novel experimental-design-based algorithm to minimize regret in online stochastic linear and combinatorial bandits. While existing literature tends to focus on optimism-based algorithms--which have been shown to be suboptimal in many cases--our approach carefully plans which action to take by balancing the tradeoff between information gain and reward, overcoming the failures of optimism. In addition, we leverage tools from the theory of suprema of empirical processes to obtain regret guarantees that scale with the Gaussian width of the action set, avoiding wasteful union bounds. We provide state-of-the-art finite-time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regimes. In the combinatorial semi-bandit setting, we show that our algorithm is computationally efficient and relies only on calls to a linear maximization oracle. In addition, we show that with slight modification our algorithm can be used for pure exploration, obtaining state-of-the-art pure exploration guarantees in the semi-bandit setting. Finally, we provide, to the best of our knowledge, the first example where optimism fails in the semi-bandit regime, and show that in this setting our algorithm succeeds.
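The experimental-design primitive behind approaches like this is often a G-optimal design: a sampling distribution over arms that minimizes the worst-case prediction variance. Here is a generic Frank-Wolfe (Wynn-style) sketch of that computation, not the paper's algorithm; the arm set is an invented toy example.

```python
import numpy as np

# Hypothetical arm set in R^d (illustrative only).
arms = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.7, -0.3]])
n, d = arms.shape

# Frank-Wolfe iterations for the G-optimal design: choose weights lam
# minimizing max_z z^T A(lam)^{-1} z, where A(lam) = sum_z lam_z z z^T.
lam = np.full(n, 1.0 / n)
for t in range(1, 500):
    A = (arms.T * lam) @ arms                          # design matrix A(lam)
    Ainv = np.linalg.inv(A)
    var = np.einsum('ij,jk,ik->i', arms, Ainv, arms)   # z^T A^{-1} z per arm
    i = np.argmax(var)                  # most under-explored direction
    step = 1.0 / (t + 2)                # standard Frank-Wolfe step size
    lam = (1 - step) * lam
    lam[i] += step

# Kiefer-Wolfowitz: at the optimum, max_z z^T A(lam)^{-1} z equals d.
A = (arms.T * lam) @ arms
worst = max(np.einsum('ij,jk,ik->i', arms, np.linalg.inv(A), arms))
```

After a few hundred iterations `worst` approaches d = 2, the value guaranteed at the optimum by the Kiefer-Wolfowitz equivalence theorem; sampling arms in proportion to `lam` then controls the worst-case estimation error uniformly over the arm set.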


The News on Auto-tuning

#artificialintelligence

Ed. Note: this post is in my voice, but it was co-written with Kevin Jamieson. Kevin provided the awesome plots too. It's all the rage in machine learning these days to build complex, deep pipelines with thousands of tunable parameters. Now, I don't mean parameters that we learn by stochastic gradient descent; I mean architectural choices, like the value of the regularization parameter, the size of a convolutional window, or the breadth of a spatio-temporal tower of attention.
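The simplest baseline for tuning such parameters is plain random search over the configuration space. A minimal sketch, with an invented stand-in for a validation-loss function of a regularization strength and a window size (the function and its optimum are purely illustrative):

```python
import random

random.seed(0)

# Toy stand-in for validation loss as a function of two hyperparameters;
# its minimum sits at reg = 0.01, window = 5. Purely illustrative.
def val_loss(reg, window):
    return (reg - 0.01) ** 2 * 1e4 + (window - 5) ** 2 * 0.1

best = None
for _ in range(100):
    cfg = (10 ** random.uniform(-4, 0),   # regularization, log-uniform
           random.randint(1, 11))         # integer window size
    loss = val_loss(*cfg)
    if best is None or loss < best[0]:
        best = (loss, cfg)
```

Sampling the regularization strength log-uniformly matters here: its plausible values span several orders of magnitude, and a uniform draw would waste almost all trials near the top of the range.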